A Preprocessing Framework and Approach for Web Applications

Authors

  • Zhigang Zhang
  • Jin Chen
  • Xiaoming Li
Abstract

In the process of abstract extraction, two heuristics are used. First, the content of a page is organized semantically by paragraphs. Second, readers can quickly grasp the ideas of a page from only a small number of sentences, and these sentences are generally located by keywords. Based on these heuristics, we derive an algorithm that simulates how a reader browses a page, extracting the abstract using the keywords obtained in the previous phase. The algorithm first divides the topic content into several paragraphs according to the tag tree, and then picks the sentences with high weights from each paragraph to make up the abstract of the page. Two important subprocesses are described in detail below.

1. Identify the paragraphs of the content

Fortunately, the structure of container tags describes the layout of HTML pages, which makes it possible to identify the paragraphs of the content. In the topic content tree, first locate the node that is the lowest common ancestor of all leaf nodes (we call it the topic root). The topic root corresponds to the tag that exactly encloses all topic content nodes. The child nodes of the topic root then correspond to the paragraphs of the topic content. Figure 3 shows the process of identifying paragraphs. In (3) of Figure 3, each p block is a paragraph.
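The two steps described above (finding the topic root as the lowest common ancestor of all leaf nodes, then picking high-weight sentences from each paragraph) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Node` class and the function names are hypothetical, and because the excerpt does not give the weight formula, the sketch assumes a sentence's weight is simply the number of keyword occurrences it contains.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; avoids parent/child recursion
class Node:
    """Hypothetical node of the topic content tree (tag plus optional text)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes under `node` in document order."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def path_from_root(node: Node) -> List[Node]:
    """Return the chain of nodes from the tree root down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path[::-1]


def topic_root(tree: Node) -> Node:
    """Lowest common ancestor of all leaves: the deepest node shared by every root-to-leaf path."""
    paths = [path_from_root(leaf) for leaf in leaves(tree)]
    root = tree
    for level in zip(*paths):
        if all(n is level[0] for n in level):
            root = level[0]
        else:
            break
    return root


def paragraphs(tree: Node) -> List[str]:
    """Each child of the topic root is treated as one paragraph of text."""
    root = topic_root(tree)
    if not root.children:  # a single leaf: treat it as one paragraph
        return [root.text]
    return [" ".join(leaf.text for leaf in leaves(child)) for child in root.children]


def sentence_weight(sentence: str, keywords: set) -> int:
    """Assumed weight: number of keyword occurrences in the sentence."""
    return sum(1 for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w in keywords)


def extract_abstract(tree: Node, keywords: List[str], per_paragraph: int = 1) -> List[str]:
    """Pick the highest-weight sentences (weight > 0) from each paragraph."""
    kw = {k.lower() for k in keywords}
    picked: List[str] = []
    for para in paragraphs(tree):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s]
        ranked = sorted(sentences, key=lambda s: sentence_weight(s, kw), reverse=True)
        picked.extend(s for s in ranked[:per_paragraph] if sentence_weight(s, kw) > 0)
    return picked


def build_example() -> Node:
    """Made-up topic content tree: html > body > div > two p blocks."""
    tree = Node("html")
    div = tree.add(Node("body")).add(Node("div"))
    div.add(Node("p", "Crawlers fetch pages. Preprocessing removes noise from pages."))
    div.add(Node("p", "Keywords guide the reader. The weather was nice."))
    return tree
```

In the made-up example tree, the `div` node is the topic root and each `p` child is one paragraph, mirroring the situation the text describes for Figure 3. A real system would build the tree from parsed HTML and would likely use a richer weighting scheme than raw keyword counts.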


Similar resources

Design and Evaluation of a Method for Partitioning and Offloading Web-based Applications in Mobile Systems with Bandwidth Constraints

Computation offloading is known to be among the effective solutions for running heavy applications on smart mobile devices. However, irregular changes in the mobile data rate have a direct impact on code partitioning while offloading is in progress. It is believed that once a rate-adaptive partitioning is performed, the replication of such substantial processes due to bandwidth fluctuation can be avoid...


Optimizing the Execution and Response of Web Pages in the Cloud Using Preprocessing Methods: A Case Study of the Varnish and Nginx Systems

The response speed of Web pages is one of the necessities of information technology. In recent years, renowned companies such as Google, as well as computer scientists, have focused on speeding up the web. Achievements such as Google PageSpeed, Nginx, and Varnish are the result of this research. In Customer to Customer (C2C) business systems, such as chat systems, and in Business to Customer (B2C) systems, s...


A density based clustering approach to distinguish between web robot and human requests to a web server

Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of their advantages, robots may consume bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...


An Efficient Framework for Accurate Arterial Input Selection in DSC-MRI of Glioma Brain Tumors

Introduction: Automatic arterial input function (AIF) selection plays an essential role in the quantification of cerebral perfusion parameters. The purpose of this study is to develop an optimal automatic method for AIF determination in dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) of glioma brain tumors by using a new preprocessing method. Material and Methods: For this study, ...


A Practical Classification of the Functions of Intelligent Software Agents and Their Mapping to the Features of Digital Library Websites

Purpose: Web services are currently considered the technologies with the highest number of applications for providing automatic, high-quality, and fast information interactions. The aim of this paper is therefore to provide a comprehensive framework for a collection of significant services offered by Farsi library websites, to be used in future designs. It also aims to classif...


An Algorithmic Approach to Data Preprocessing in Web Usage Mining

Web usage mining is an area of web mining that deals with the extraction of interesting knowledge from the logging information produced by web servers. Different data mining techniques can be applied to web usage data to extract user access patterns, and this knowledge can be used in a variety of applications such as system improvement, web site modification, business intelligence, etc. Web usage minin...



Journal title:
  • J. Web Eng.

Volume 2  Issue 

Pages  -

Publication date 2004